4 research outputs found

DMR API: Improving cluster productivity by turning applications into malleable

Author: Beltrán Vicenç
Iserte Agut Sergio
Mayo Gual Rafael
Peña Monferrer Antonio José
Quintana Ortí Enrique Salvador
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

Adaptive workloads can change the configuration of their jobs on the fly, in terms of the number of processes. To carry out these job reconfigurations, we have designed a methodology that enables a job to communicate with the resource manager and, through the runtime, to change its number of MPI ranks. The collaboration between the workload manager (aware of the job queue and the resource allocation) and the parallel runtime (able to transparently handle the processes and the program data) is crucial for our throughput-aware malleability methodology. Hence, when a job triggers a reconfiguration, the resource manager checks the cluster status and returns the appropriate action: i) expand, if there are spare resources; ii) shrink, if queued jobs can be initiated; or iii) none, if no change can improve the global productivity. In this paper, we describe the internals of our framework and demonstrate how it reduces the global workload completion time while using the underlying resources more efficiently. For this purpose, we present a thorough study of adaptive workload processing, showing the detailed behavior of our framework in representative experiments.

© 2018 Elsevier B.V. All rights reserved.

Iserte Agut, S.; Mayo Gual, R.; Quintana Ortí, ES.; Beltrán, V.; Peña Monferrer, AJ. (2018). DMR API: Improving cluster productivity by turning applications into malleable. Parallel Computing. 78:54-66. https://doi.org/10.1016/j.parco.2018.07.006

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Repositori Institucional de la Universitat Jaume I

RiuNet

SLURM Support for Remote GPU Virtualization: Implementation and Performance Study

Author: Castello Gimeno Adrián
Duato Marín José Francisco
Iserte Agut Sergio
Mayo Gual Rafael
Prades Gasulla Javier
Quintana Ortí Enrique Salvador
Reaño González Carlos
Silla Jiménez Federico
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/10/2014

© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

SLURM is a resource manager that can be leveraged to share a collection of heterogeneous resources among the jobs in execution in a cluster. However, SLURM is not designed to handle resources such as graphics processing units (GPUs). Concretely, although SLURM can use a generic resource plugin (GRes) to manage GPUs, with this solution the hardware accelerators can only be accessed by the job that is in execution on the node to which the GPU is attached. This is a serious constraint for remote GPU virtualization technologies, which aim to provide user-transparent access to all GPUs in the cluster, regardless of where the node running the application is located with respect to the GPU node. In this work we introduce a new type of device in SLURM, "rgpu", in order to gain access from any application node to any GPU node in the cluster using rCUDA as the remote GPU virtualization solution. With this new scheduling mechanism, a user can access any number of GPUs, as SLURM schedules the tasks taking into account all the graphics accelerators available in the complete cluster. We present experimental results that show the benefits of this new approach in terms of increased flexibility for the job scheduler.

The researchers at UPV were supported by the Generalitat Valenciana under Grant PROMETEOII/2013/009 of the PROMETEO program phase II. Researchers at UJI were supported by MINECO, by FEDER funds under Grant TIN2011-23283, and by the Fundacion Caixa-Castelló Bancaixa (Grant P11B2013-21).

Iserte Agut, S.; Castello Gimeno, A.; Mayo Gual, R.; Quintana Ortí, ES.; Silla Jiménez, F.; Duato Marín, JF.; Reaño González, C.... (2014). SLURM Support for Remote GPU Virtualization: Implementation and Performance Study. In Computer Architecture and High Performance Computing (SBAC-PAD), 2014 IEEE 26th International Symposium on. IEEE. 318-325. https://doi.org/10.1109/SBAC-PAD.2014.49

Crossref

RiuNet

High-throughput Computation through Efficient Resource Management

Author: Iserte Agut Sergio
Publication venue: 'Universitat Jaume I'
Publication date: 01/01/2018

This proposal addresses, from two different approaches, the improvement of data center productivity through efficient resource management. The first approach combines remote GPU virtualization technologies with workload managers in HPC clusters and cloud computing environments. The second reconfigures jobs by varying their number of processes during execution. Performance evaluations reveal a non-negligible improvement not only in throughput, but also in job waiting time and energy consumption.

Programa de Doctorat en Informàtic

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa

Repositori Institucional de la Universitat Jaume I

SLURM Support for Remote GPU Virtualization: Implementation and Performance Study

Author: Castello Gimeno Adrián
Duato Marín José Francisco
Iserte Agut Sergio
Mayo Gual Rafael
Prades Gasulla Javier
Quintana Ortí Enrique Salvador
Reaño González Carlos
Silla Jiménez Federico
Publication date: 01/01/2014

Queen's University Belfast Research Portal

Crossref

RiuNet